JEEHP : Journal of Educational Evaluation for Health Professions

9 "Dong Gi Seo"
Research articles
Development of a character qualities test for medical students in Korea using polytomous item response theory and factor analysis: a preliminary scale development study  
Yera Hur, Dong Gi Seo
J Educ Eval Health Prof. 2023;20:20.   Published online June 26, 2023
DOI: https://doi.org/10.3352/jeehp.2023.20.20
  • 1,285 Views
  • 101 Downloads
Abstract
Purpose
This study aimed to develop a test scale to measure the character qualities of medical students as a follow-up study on the 8 core character qualities revealed in a previous report.
Methods
In total, 160 preliminary items were developed to measure the 8 core character qualities. Twenty questions were assigned to each quality, and a questionnaire survey was conducted among 856 students in 5 medical schools in Korea. Polytomous item response theory analysis using the partial credit model was carried out to evaluate goodness-of-fit, followed by exploratory factor analysis. Finally, confirmatory factor and reliability analyses were conducted with the final selected items.
Results
The preliminary items for the 8 core character qualities were administered to the participants, and data from 767 students were included in the final analysis. Of the 160 preliminary items, 25 were removed based on classical test theory analysis and 17 more based on polytomous item response theory analysis. The remaining 118 items and their sub-factors were submitted to exploratory factor analysis. Finally, 79 items were selected, and their validity and reliability were confirmed through confirmatory factor analysis and intra-item relevance analysis.
Conclusion
The character qualities test scale developed through this study can be used to measure the character qualities corresponding to the educational goals and visions of individual medical schools in Korea. Furthermore, this measurement tool can serve as primary data for developing character qualities tools tailored to each medical school’s vision and educational goals.
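As a reference for the polytomous model used in the Methods above: under the partial credit model, the probability of each response category depends on the examinee's trait level and the item's step difficulties. A minimal base-R sketch (illustrative only; `pcm_prob` is a hypothetical helper, not code from the study):

```r
# Minimal sketch (not from the article): category probabilities under the
# partial credit model for one item with step difficulties `delta`.
pcm_prob <- function(theta, delta) {
  # Cumulative sums of (theta - delta_v); category 0 has numerator exp(0) = 1
  num <- exp(c(0, cumsum(theta - delta)))
  num / sum(num)
}

# Example: a 5-category item (4 steps), examinee at theta = 0.5
pcm_prob(0.5, delta = c(-1.0, -0.2, 0.4, 1.1))
```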
The accuracy and consistency of mastery for each content domain using the Rasch and deterministic inputs, noisy “and” gate diagnostic classification models: a simulation study and a real-world analysis using data from the Korean Medical Licensing Examination  
Dong Gi Seo, Jae Kum Kim
J Educ Eval Health Prof. 2021;18:15.   Published online July 5, 2021
DOI: https://doi.org/10.3352/jeehp.2021.18.15
  • 4,326 Views
  • 288 Downloads
  • 2 Web of Science citations
  • 2 Crossref citations
Abstract
Purpose
Diagnostic classification models (DCMs) were developed to identify mastery or non-mastery of the attributes required for solving test items, but their application has been limited to very low-level attributes, and the accuracy and consistency of high-level attribute classification using DCMs have rarely been reported in comparison with classical test theory (CTT) and item response theory models. This paper compared the accuracy of high-level attribute mastery between the deterministic inputs, noisy “and” gate (DINA) model and the Rasch model, along with sub-scores based on CTT.
Methods
First, a simulation study explored the effects of attribute length (number of items per attribute) and the correlations among attributes with respect to the accuracy of mastery. Second, a real-data study examined model and item fit and investigated the consistency of mastery for each attribute among the 3 models using the 2017 Korean Medical Licensing Examination with 360 items.
Results
Accuracy of mastery increased with a higher number of items measuring each attribute across all conditions. The DINA model was more accurate than the CTT and Rasch models for attributes with high correlations (>0.5) and few items. In the real-data analysis, the DINA and Rasch models generally showed better item fits and appropriate model fit. The consistency of mastery between the Rasch and DINA models ranged from 0.541 to 0.633 and the correlations of person attribute scores between the Rasch and DINA models ranged from 0.579 to 0.786.
Conclusion
Although all 3 models provide a mastery decision for each examinee, the individual mastery profile using the DINA model provides more accurate decisions for attributes with high correlations than the CTT and Rasch models. The DINA model can also be directly applied to tests with complex structures, unlike the CTT and Rasch models, and it provides different diagnostic information from the CTT and Rasch models.
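For reference, the DINA model named above assigns each examinee a binary attribute profile and models a correct response through per-item guessing (g) and slipping (s) parameters. A minimal base-R sketch of the standard DINA item response function (`dina_prob` is an illustrative name, not the authors' code):

```r
# Standard DINA item response function. `alpha` is the examinee's binary
# attribute profile, `q` the item's Q-matrix row, and g/s the item's
# guessing and slipping parameters.
dina_prob <- function(alpha, q, g, s) {
  eta <- as.numeric(all(alpha[q == 1] == 1))  # 1 only if every required attribute is mastered
  (1 - s)^eta * g^(1 - eta)
}

# Example: the item requires attributes 1 and 3; the examinee lacks attribute 3,
# so a correct answer can only come from guessing.
dina_prob(alpha = c(1, 1, 0), q = c(1, 0, 1), g = 0.2, s = 0.1)  # -> 0.2
```

An examinee who has mastered every attribute an item requires answers it correctly with probability 1 − s; anyone else succeeds only by guessing, with probability g.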

Citations to this article as recorded by
  • Stable Knowledge Tracing Using Causal Inference
    Jia Zhu, Xiaodong Ma, Changqin Huang
    IEEE Transactions on Learning Technologies.2024; 17: 124.     CrossRef
  • Development of a character qualities test for medical students in Korea using polytomous item response theory and factor analysis: a preliminary scale development study
    Yera Hur, Dong Gi Seo
    Journal of Educational Evaluation for Health Professions.2023; 20: 20.     CrossRef
Estimation of item parameters and examinees’ mastery probability in each domain of the Korean Medical Licensing Examination using a deterministic inputs, noisy “and” gate (DINA) model  
Younyoung Choi, Dong Gi Seo
J Educ Eval Health Prof. 2020;17:35.   Published online November 17, 2020
DOI: https://doi.org/10.3352/jeehp.2020.17.35
  • 4,784 Views
  • 97 Downloads
Abstract
Purpose
The deterministic inputs, noisy “and” gate (DINA) model is a promising statistical method for providing useful diagnostic information about students’ level of achievement, as educators often want to know how examinees performed on each content strand, which is referred to as a diagnostic profile. The purpose of this paper was to classify examinees of the Korean Medical Licensing Examination (KMLE) in different content domains using the DINA model.
Methods
This paper analyzed data from the KMLE, with 360 items and 3,259 examinees. An application study was conducted to estimate examinees’ parameters and item characteristics. The guessing and slipping parameters of each item were estimated, and statistical analysis was conducted using the DINA model.
Results
The estimated item parameters can be used to check item quality, as illustrated by example items in the output table. The probabilities of mastery of each content domain were also estimated, yielding a mastery profile for each examinee. The classification accuracy and consistency for the 8 content domains ranged from 0.849 to 0.972 and from 0.839 to 0.994, respectively. Thus, the classification reliability of the cognitive diagnosis model was very high for the 8 content domains of the KMLE.
Conclusion
This mastery profile can provide useful diagnostic information for each examinee in terms of each content domain of the KMLE. Individual mastery profiles allow educators and examinees to understand which domain(s) should be improved in order to master all domains in the KMLE. In addition, all items showed reasonable results in terms of item parameters.
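To make the idea of a mastery profile concrete, the sketch below computes an examinee's posterior mastery probability for each attribute from a response vector and a Q-matrix. It is an assumed setup, not the study's code: a uniform prior over profiles and known item parameters are assumed, and `dina_prob` and `posterior_mastery` are hypothetical names.

```r
# Posterior mastery probabilities under the DINA model (sketch).
dina_prob <- function(alpha, q, g, s) {
  eta <- as.numeric(all(alpha[q == 1] == 1))  # 1 if all required attributes mastered
  (1 - s)^eta * g^(1 - eta)
}

posterior_mastery <- function(x, Q, g, s) {
  K <- ncol(Q)
  profiles <- as.matrix(expand.grid(rep(list(0:1), K)))  # all 2^K attribute profiles
  lik <- apply(profiles, 1, function(alpha) {
    p <- sapply(seq_len(nrow(Q)), function(j) dina_prob(alpha, Q[j, ], g[j], s[j]))
    prod(p^x * (1 - p)^(1 - x))  # Bernoulli likelihood of the full response vector
  })
  post <- lik / sum(lik)         # uniform prior, so the scaled likelihood is the posterior
  colSums(profiles * post)       # marginal mastery probability for each attribute
}

# Example: 3 items, 2 attributes, examinee answers items 1 and 2 correctly
Q <- rbind(c(1, 0), c(0, 1), c(1, 1))
posterior_mastery(x = c(1, 1, 0), Q = Q, g = rep(0.2, 3), s = rep(0.1, 3))
```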
Software report
Introduction to the LIVECAT web-based computerized adaptive testing platform  
Dong Gi Seo, Jeongwook Choi
J Educ Eval Health Prof. 2020;17:27.   Published online September 29, 2020
DOI: https://doi.org/10.3352/jeehp.2020.17.27
  • 5,340 Views
  • 131 Downloads
  • 3 Web of Science citations
  • 3 Crossref citations
Abstract
This study introduces LIVECAT, a web-based computerized adaptive testing (CAT) platform. The platform provides many functions, including writing item content, managing an item bank, creating and administering a test, reporting test results, and providing information about a test and examinees. LIVECAT offers examination administrators an easy and flexible environment for composing and managing examinations. It is available at http://www.thecatkorea.com/. Several tools were used to program LIVECAT: operating system, Amazon Linux; web server, nginx 1.18; web application server, Apache Tomcat 8.5; database, Amazon RDS (MariaDB); and languages, Java 8, HTML5/CSS, JavaScript, and jQuery. The platform can implement several item response theory (IRT) models, such as the Rasch model and the 1-, 2-, and 3-parameter logistic models, and the administrator can choose a specific model for test construction. Multimedia data such as images, audio files, and movies can be uploaded to items. Two scoring methods (maximum likelihood estimation and expected a posteriori) are available, and the maximum Fisher information item selection method is applied to every IRT model. LIVECAT showed equal or better performance compared with a conventional test platform, and it enables users without psychometric expertise to easily implement and perform computerized adaptive testing at their institutions. The most recent version provides only dichotomous item response models and the basic components of CAT. In the near future, LIVECAT will include advanced functions such as polytomous item response models, the weighted likelihood estimation method, and content balancing.
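As an illustration of the maximum Fisher information rule mentioned above, here is a sketch of the general technique for the 2-parameter logistic model. This is not LIVECAT's internal code; `info_2pl` and `next_item` are hypothetical names.

```r
# Maximum Fisher information (MFI) item selection for the 2PL model.
# For the 2PL, P(theta) = 1 / (1 + exp(-a * (theta - b))) and the item
# information is I(theta) = a^2 * P * (1 - P).
info_2pl <- function(theta, a, b) {
  p <- plogis(a * (theta - b))
  a^2 * p * (1 - p)
}

next_item <- function(theta_hat, bank, administered) {
  info <- info_2pl(theta_hat, bank$a, bank$b)
  info[administered] <- -Inf    # never reuse an item
  which.max(info)               # index of the most informative remaining item
}

# Example: 5-item bank, current ability estimate 0.3, item 2 already given
bank <- data.frame(a = c(1.2, 0.8, 1.5, 1.0, 2.0), b = c(-1, 0.2, 0.5, 1.3, 0.3))
next_item(0.3, bank, administered = 2)
```

At the current ability estimate, the item with the largest information, that is, the greatest expected reduction in measurement error, is administered next.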

Citations to this article as recorded by
  • Presidential address: improving item validity and adopting computer-based testing, clinical skills assessments, artificial intelligence, and virtual reality in health professions licensing examinations in Korea
    Hyunjoo Pai
    Journal of Educational Evaluation for Health Professions.2023; 20: 8.     CrossRef
  • Patient-reported outcome measures in cancer care: Integration with computerized adaptive testing
    Minyu Liang, Zengjie Ye
    Asia-Pacific Journal of Oncology Nursing.2023; 10(12): 100323.     CrossRef
  • Development of a character qualities test for medical students in Korea using polytomous item response theory and factor analysis: a preliminary scale development study
    Yera Hur, Dong Gi Seo
    Journal of Educational Evaluation for Health Professions.2023; 20: 20.     CrossRef
Corrigendum
Funding information of the article entitled “Post-hoc simulation study of computerized adaptive testing for the Korean Medical Licensing Examination”
Dong Gi Seo, Jeongwook Choi
J Educ Eval Health Prof. 2018;15:27.   Published online December 4, 2018
DOI: https://doi.org/10.3352/jeehp.2018.15.27
Corrects: J Educ Eval Health Prof 2018;15(0):14
  • 17,059 Views
  • 176 Downloads
Research articles
Linear programming method to construct equated item sets for the implementation of periodical computer-based testing for the Korean Medical Licensing Examination  
Dong Gi Seo, Myeong Gi Kim, Na Hui Kim, Hye Sook Shin, Hyun Jung Kim
J Educ Eval Health Prof. 2018;15:26.   Published online October 18, 2018
DOI: https://doi.org/10.3352/jeehp.2018.15.26
  • 20,586 Views
  • 278 Downloads
  • 2 Web of Science citations
  • 2 Crossref citations
Abstract
Purpose
This study aimed to identify the best way of developing equivalent item sets and to propose a stable and effective management plan for periodical licensing examinations.
Methods
Five pre-equated item sets were developed based on the predicted correct answer rate of each item using linear programming. These pre-equated item sets were compared with item sets developed using a random item selection method based on the actual correct answer rate (ACAR) and on difficulty from item response theory (IRT). The results with and without common items were also compared in the same way. ACAR and IRT difficulty were used to determine whether the pre-equating conditions differed significantly.
Results
There was a statistically significant difference in IRT difficulty among the pre-equated conditions. When the predicted correct answer rate was divided into 2 or 3 difficulty categories, the ACAR and IRT difficulty parameters of the 5 item sets were equivalent. Comparing the conditions with and without common items showed that including common items did not make a significant contribution to the equating of the 5 item sets.
Conclusion
This study suggests that the linear programming method is applicable for constructing equated item sets that reflect each content area. The best way to construct equated item sets is to divide the predicted correct answer rate into 2 or 3 difficulty categories, regardless of common items. If pre-equated item sets must be constructed from actual data, several methods should be compared in simulation studies to determine the optimal one before administering a real test.
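To show the flavor of the linear programming approach, here is a minimal sketch that selects one fixed-length item set whose mean predicted correct answer rate falls within a tolerance of a target, formulated as a 0/1 program with the lpSolve R package. The toolchain and the variable names are assumptions; the study's actual formulation, which balances content areas and builds 5 parallel sets, is more elaborate.

```r
# Pick a fixed-length item set matching a target mean predicted rate (sketch).
library(lpSolve)

set.seed(1)
p_hat  <- runif(30, 0.4, 0.9)   # predicted correct answer rate per bank item
n_form <- 10                    # items per form
target <- 0.70                  # target mean predicted rate for the form
tol    <- 0.02                  # allowed deviation of the form mean

const.mat <- rbind(rep(1, 30),  # form length: sum of selections = n_form
                   p_hat,       # upper bound on the total predicted rate
                   p_hat)       # lower bound on the total predicted rate
const.dir <- c("=", "<=", ">=")
const.rhs <- c(n_form, n_form * (target + tol), n_form * (target - tol))

sol <- lp("max", objective.in = p_hat, const.mat = const.mat,
          const.dir = const.dir, const.rhs = const.rhs, all.bin = TRUE)
which(sol$solution == 1)        # indices of the selected items
```

Extending this to 5 parallel forms adds one binary variable per item-form pair and a constraint that each item is assigned to at most one form.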

Citations to this article as recorded by
  • Application of computer-based testing in the Korean Medical Licensing Examination, the emergence of the metaverse in medical education, journal metrics and statistics, and appreciation to reviewers and volunteers
    Sun Huh
    Journal of Educational Evaluation for Health Professions.2022; 19: 2.     CrossRef
  • Reading Comprehension Tests for Children: Test Equating and Specific Age-Interval Reports
    Patrícia Silva Lúcio, Fausto Coutinho Lourenço, Hugo Cogo-Moreira, Deborah Bandalos, Carolina Alves Ferreira de Carvalho, Adriana de Souza Batista Kida, Clara Regina Brandão de Ávila
    Frontiers in Psychology.2021;[Epub]     CrossRef
Post-hoc simulation study of computerized adaptive testing for the Korean Medical Licensing Examination  
Dong Gi Seo, Jeongwook Choi
J Educ Eval Health Prof. 2018;15:14.   Published online May 17, 2018
DOI: https://doi.org/10.3352/jeehp.2018.15.14
Correction in: J Educ Eval Health Prof 2018;15(0):27
  • 36,348 Views
  • 321 Downloads
  • 8 Web of Science citations
  • 7 Crossref citations
Abstract
Purpose
Computerized adaptive testing (CAT) has been adopted in licensing examinations because it improves the efficiency and accuracy of the tests, as shown in many studies. This simulation study investigated CAT scoring and item selection methods for the Korean Medical Licensing Examination (KMLE).
Methods
This study used a post-hoc (real data) simulation design. The item bank included all items from the January 2017 KMLE. All CAT algorithms were implemented using the ‘catR’ package in R.
Results
In terms of accuracy, the Rasch and 2-parameter logistic (PL) models performed better than the 3PL model. The modal a posteriori and expected a posteriori methods provided more accurate estimates than maximum likelihood estimation or weighted likelihood estimation. Furthermore, maximum posterior weighted information and minimum expected posterior variance performed better than other item selection methods. In terms of efficiency, the Rasch model is recommended to reduce test length.
Conclusion
Before implementing live CAT, a simulation study should be performed under varied test conditions, and specific scoring and item selection methods should be predetermined based on its results.
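A minimal post-hoc CAT loop of the kind described can be sketched in base R under the Rasch model with EAP scoring and maximum-information selection. This is illustrative only; `post_hoc_cat` and `eap` are hypothetical names, not the study's catR configuration.

```r
# Post-hoc CAT: re-administer an examinee's real responses adaptively (sketch).
eap <- function(resp, b_admin, grid = seq(-4, 4, 0.1)) {
  # Posterior mean of theta under a standard normal prior
  lik <- sapply(grid, function(th) {
    p <- plogis(th - b_admin)
    prod(p^resp * (1 - p)^(1 - resp))
  })
  post <- lik * dnorm(grid)
  sum(grid * post) / sum(post)
}

post_hoc_cat <- function(x, b, test_length = 20) {
  admin <- integer(0); theta <- 0
  for (step in seq_len(test_length)) {
    info <- plogis(theta - b) * (1 - plogis(theta - b))  # Rasch item information
    info[admin] <- -Inf
    admin <- c(admin, which.max(info))                   # MFI selection
    theta <- eap(x[admin], b[admin])                     # update ability estimate
  }
  list(theta = theta, items = admin)
}

# Example with simulated "real" responses from a 100-item bank
set.seed(7)
b <- rnorm(100); true_theta <- 0.8
x <- rbinom(100, 1, plogis(true_theta - b))
post_hoc_cat(x, b)$theta
```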

Citations to this article as recorded by
  • Presidential address: improving item validity and adopting computer-based testing, clinical skills assessments, artificial intelligence, and virtual reality in health professions licensing examinations in Korea
    Hyunjoo Pai
    Journal of Educational Evaluation for Health Professions.2023; 20: 8.     CrossRef
  • Developing Computerized Adaptive Testing for a National Health Professionals Exam: An Attempt from Psychometric Simulations
    Lingling Xu, Zhehan Jiang, Yuting Han, Haiying Liang, Jinying Ouyang
    Perspectives on Medical Education.2023;[Epub]     CrossRef
  • Optimizing Computer Adaptive Test Performance: A Hybrid Simulation Study to Customize the Administration Rules of the CAT-EyeQ in Macular Edema Patients
    T. Petra Rausch-Koster, Michiel A. J. Luijten, Frank D. Verbraak, Ger H. M. B. van Rens, Ruth M. A. van Nispen
    Translational Vision Science & Technology.2022; 11(11): 14.     CrossRef
• The accuracy and consistency of mastery for each content domain using the Rasch and deterministic inputs, noisy “and” gate diagnostic classification models: a simulation study and a real-world analysis using data from the Korean Medical Licensing Examination
    Dong Gi Seo, Jae Kum Kim
    Journal of Educational Evaluation for Health Professions.2021; 18: 15.     CrossRef
  • Linear programming method to construct equated item sets for the implementation of periodical computer-based testing for the Korean Medical Licensing Examination
    Dong Gi Seo, Myeong Gi Kim, Na Hui Kim, Hye Sook Shin, Hyun Jung Kim
    Journal of Educational Evaluation for Health Professions.2018; 15: 26.     CrossRef
  • Funding information of the article entitled “Post-hoc simulation study of computerized adaptive testing for the Korean Medical Licensing Examination”
    Dong Gi Seo, Jeongwook Choi
    Journal of Educational Evaluation for Health Professions.2018; 15: 27.     CrossRef
• Updates from 2018: Being indexed in Embase, becoming an affiliated journal of the World Federation for Medical Education, implementing an optional open data policy, adopting principles of transparency and best practice in scholarly publishing, and appreciation to reviewers and volunteers
    Sun Huh
    Journal of Educational Evaluation for Health Professions.2018; 15: 36.     CrossRef
Usefulness of the DETECT program for assessing the internal structure of dimensionality in simulated data and results of the Korean nursing licensing examination  
Dong Gi Seo, Younyoung Choi, Sun Huh
J Educ Eval Health Prof. 2017;14:32.   Published online December 27, 2017
DOI: https://doi.org/10.3352/jeehp.2017.14.32
  • 25,319 Views
  • 262 Downloads
  • 3 Web of Science citations
  • 4 Crossref citations
Abstract
Purpose
The dimensionality of examinations provides empirical evidence of the internal test structure underlying the responses to a set of items. In turn, the internal structure is an important piece of evidence of the validity of an examination. Thus, the aim of this study was to investigate the performance of the DETECT program and to use it to examine the internal structure of the Korean nursing licensing examination.
Methods
Non-parametric methods of dimensionality assessment, such as the DETECT program, have been proposed to overcome the limitations of traditional parametric methods. The DETECT program was investigated using simulated data under several conditions and was then applied to the Korean nursing licensing examination.
Results
The DETECT program performed well in terms of determining the number of underlying dimensions under several different conditions in the simulated data. Further, the DETECT program correctly revealed the internal structure of the Korean nursing licensing examination, meaning that it detected the proper number of dimensions and appropriately clustered the items within each dimension.
Conclusion
The DETECT program performed well in detecting the number of dimensions and in assigning items for each dimension. This result implies that the DETECT method can be useful for examining the internal structure of assessments, such as licensing examinations, that possess relatively many domains and content areas.
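For reference, the DETECT index that underlies the program is typically written as follows (textbook form, not reproduced from the article):

```latex
% Standard form of the DETECT index for a partition P of n items into clusters
D(\mathcal{P}) = \frac{2}{n(n-1)} \sum_{i<j} \delta_{ij}(\mathcal{P})\,
                 \widehat{\operatorname{Cov}}\!\left(X_i, X_j \mid \theta\right),
\qquad
\delta_{ij}(\mathcal{P}) =
\begin{cases}
 +1, & \text{items } i \text{ and } j \text{ in the same cluster,}\\
 -1, & \text{otherwise.}
\end{cases}
```

The conditional covariances are usually estimated by conditioning on the rest score (the total score excluding items i and j). DETECT searches over partitions for the maximizer of D, and values near zero indicate essential unidimensionality.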

Citations to this article as recorded by
  • Meanings of Rough Sex across Gender, Sexual Identity, and Political Ideology: A Conditional Covariance Approach
    Dubravka Svetina Valdivia, Debby Herbenick, Tsung-chieh Fu, Heather Eastman-Mueller, Lucia Guerra-Reyes, Molly Rosenberg
    Journal of Sex & Marital Therapy.2022; 48(6): 579.     CrossRef
• The accuracy and consistency of mastery for each content domain using the Rasch and deterministic inputs, noisy “and” gate diagnostic classification models: a simulation study and a real-world analysis using data from the Korean Medical Licensing Examination
    Dong Gi Seo, Jae Kum Kim
    Journal of Educational Evaluation for Health Professions.2021; 18: 15.     CrossRef
  • Estimation of item parameters and examinees’ mastery probability in each domain of the Korean Medical Licensing Examination using a deterministic inputs, noisy “and” gate (DINA) model
    Younyoung Choi, Dong Gi Seo
    Journal of Educational Evaluation for Health Professions.2020; 17: 35.     CrossRef
  • Linear programming method to construct equated item sets for the implementation of periodical computer-based testing for the Korean Medical Licensing Examination
    Dong Gi Seo, Myeong Gi Kim, Na Hui Kim, Hye Sook Shin, Hyun Jung Kim
    Journal of Educational Evaluation for Health Professions.2018; 15: 26.     CrossRef
Review article
Overview and current management of computerized adaptive testing in licensing/certification examinations  
Dong Gi Seo
J Educ Eval Health Prof. 2017;14:17.   Published online July 26, 2017
DOI: https://doi.org/10.3352/jeehp.2017.14.17
  • 38,864 Views
  • 371 Downloads
  • 9 Web of Science citations
  • 10 Crossref citations
Abstract
Computerized adaptive testing (CAT) has been implemented in high-stakes examinations such as the National Council Licensure Examination-Registered Nurses in the United States since 1994, and the National Registry of Emergency Medical Technicians in the United States adopted CAT for certifying emergency medical technicians in 2007. This review was written with the goal of introducing the implementation of CAT for health professions licensing examinations. Most implementations of CAT are based on item response theory, which hypothesizes that both the examinee and the items have their own characteristics that do not change. There are 5 steps for implementing CAT: first, determining whether the CAT approach is feasible for a given testing program; second, establishing an item bank; third, pretesting, calibrating, and linking item parameters via statistical analysis; fourth, determining the specifications for the final CAT in terms of the 5 components of the CAT algorithm; and finally, deploying the final CAT after specifying all the necessary components. The 5 components of the CAT algorithm are the item bank, starting item, item selection rule, scoring procedure, and termination criterion. CAT management includes content balancing, item analysis, item scoring, standard setting, practice analysis, and item bank updates. Remaining issues include the cost of constructing CAT platforms and deploying the computer technology required to build an item bank. In conclusion, to ensure more accurate estimation of examinees’ ability, CAT may be a good option for national licensing examinations, and measurement theory can support its implementation for high-stakes examinations.
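To make the termination-criterion component concrete, here is a minimal sketch of a precision-based stopping rule under the Rasch model. It is an assumed example: `should_stop` is a hypothetical name, and operational programs combine precision rules with content and exposure constraints.

```r
# Precision-based stopping rule: stop when the standard error of the ability
# estimate, SE(theta) = 1 / sqrt(total test information), drops below a
# threshold, or when the maximum test length is reached.
rasch_info <- function(theta, b) {
  p <- plogis(theta - b)
  p * (1 - p)
}

should_stop <- function(theta_hat, b_admin, se_threshold = 0.3, max_items = 30) {
  se <- 1 / sqrt(sum(rasch_info(theta_hat, b_admin)))
  se < se_threshold || length(b_admin) >= max_items
}

# Example: 12 administered Rasch items, current ability estimate 0.4
should_stop(0.4, b_admin = rnorm(12))
```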

Citations to this article as recorded by
  • Validation of the cognitive section of the Penn computerized adaptive test for neurocognitive and clinical psychopathology assessment (CAT-CCNB)
Akira Di Sandro, Tyler M. Moore, Eirini Zoupou, Kelly P. Kennedy, Katherine C. Lopez, Kosha Ruparel, Lucky J. Njokweni, Sage Rush, Tarlan Daryoush, Olivia Franco, Alesandra Gorgone, Andrew Savino, Paige Didier, Daniel H. Wolf, Monica E. Calkins, J. Cobb Scott, et al.
    Brain and Cognition.2024; 174: 106117.     CrossRef
  • The current utilization of the patient-reported outcome measurement information system (PROMIS) in isolated or combined total knee arthroplasty populations
    Puneet Gupta, Natalia Czerwonka, Sohil S. Desai, Alirio J. deMeireles, David P. Trofa, Alexander L. Neuwirth
    Knee Surgery & Related Research.2023;[Epub]     CrossRef
  • Evaluating a Computerized Adaptive Testing Version of a Cognitive Ability Test Using a Simulation Study
    Ioannis Tsaousis, Georgios D. Sideridis, Hannan M. AlGhamdi
    Journal of Psychoeducational Assessment.2021; 39(8): 954.     CrossRef
  • Accuracy and Efficiency of Web-based Assessment Platform (LIVECAT) for Computerized Adaptive Testing
    Do-Gyeong Kim, Dong-Gi Seo
    The Journal of Korean Institute of Information Technology.2020; 18(4): 77.     CrossRef
  • Transformaciones en educación médica: innovaciones en la evaluación de los aprendizajes y avances tecnológicos (parte 2)
    Veronica Luna de la Luz, Patricia González-Flores
    Investigación en Educación Médica.2020; 9(34): 87.     CrossRef
  • Introduction to the LIVECAT web-based computerized adaptive testing platform
    Dong Gi Seo, Jeongwook Choi
    Journal of Educational Evaluation for Health Professions.2020; 17: 27.     CrossRef
  • Computerised adaptive testing accurately predicts CLEFT-Q scores by selecting fewer, more patient-focused questions
    Conrad J. Harrison, Daan Geerards, Maarten J. Ottenhof, Anne F. Klassen, Karen W.Y. Wong Riff, Marc C. Swan, Andrea L. Pusic, Chris J. Sidey-Gibbons
    Journal of Plastic, Reconstructive & Aesthetic Surgery.2019; 72(11): 1819.     CrossRef
  • Presidential address: Preparing for permanent test centers and computerized adaptive testing
    Chang Hwi Kim
    Journal of Educational Evaluation for Health Professions.2018; 15: 1.     CrossRef
• Updates from 2018: Being indexed in Embase, becoming an affiliated journal of the World Federation for Medical Education, implementing an optional open data policy, adopting principles of transparency and best practice in scholarly publishing, and appreciation to reviewers and volunteers
    Sun Huh
    Journal of Educational Evaluation for Health Professions.2018; 15: 36.     CrossRef
  • Linear programming method to construct equated item sets for the implementation of periodical computer-based testing for the Korean Medical Licensing Examination
    Dong Gi Seo, Myeong Gi Kim, Na Hui Kim, Hye Sook Shin, Hyun Jung Kim
    Journal of Educational Evaluation for Health Professions.2018; 15: 26.     CrossRef
